arXiv: 1503.04045v3 [cs.FL] 16 Feb 2017 




International Journal of Foundations of Computer Science 
(c) World Scientific Publishing Company 


Diverse Palindromic Factorization is NP-Complete 


Hideo Bannai 

Department of Informatics, Kyushu University, Japan 
bannai@inf.kyushu-u.ac.jp 

Travis Gagie 

Diego Portales University, Chile 
travis.gagie@mail.udp.cl

Shunsuke Inenaga 

Department of Informatics, Kyushu University, Japan 
inenaga@inf.kyushu-u.ac.jp

Juha Kärkkäinen and Dominik Kempa

Department of Computer Science and Helsinki Institute for Information Technology HIIT, 

University of Helsinki, Finland 
juha.karkkainen@cs.helsinki.fi, dominik.kempa@cs.helsinki.fi

Marcin Piątkowski

Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Poland 
marcin.piatkowski@mat.umk.pl 

Shiho Sugimoto 

Department of Informatics, Kyushu University, Japan 
shiho.sugimoto@inf.kyushu-u.ac.jp 



We prove that it is NP-complete to decide whether a given string can be factored into 
palindromes that are each unique in the factorization. 

1. Introduction

Given a string (or word) S = S[1..n] = S[1]S[2]...S[n] of n symbols (or characters)
drawn from an alphabet Σ, a factorization of S partitions S into substrings (or
factors) F_1, F_2, ..., F_t, such that S = F_1 F_2 ... F_t. Several papers have appeared
recently on the subject of palindromic factorization; that is, factorizations where 
every factor is a palindrome. For example, a palindromic factorization of the 10- 
symbol string S = abaaaaabaa would be aba, aa, aabaa. 

The palindromic length of a string is the minimum number of palindromic
substrings into which the string can be factored. Notice that, since a single symbol is
a palindrome, the palindromic length of a string is always defined and at most the
length of the string. For our example string above, (abaaaaaba, a) is the palindromic
factorization of minimum length. Ravsky [14] proved a tight bound on the maximum
palindromic length of a binary string in terms of its length. Frid, Puzynina, and
Zamboni [7] conjectured that any infinite string in which the palindromic length
of any finite substring is bounded, is ultimately periodic. Their work led other
researchers to consider how to efficiently compute a string's palindromic length and
give a minimum palindromic factorization. It is not difficult to design a quadratic-time
algorithm that uses linear space, but doing better than that seems to require
some string combinatorics.

Alatabbi, Iliopoulos and Rahman [1] first gave a linear-time algorithm for computing
a minimum factorization into maximal palindromes, if such a factorization
exists. Notice that abaca cannot be factored into maximal palindromes, for example,
because its maximal palindromes are a, aba, a, aca and a. Fici, Gagie,
Kärkkäinen and Kempa [6] and I, Sugimoto, Inenaga, Bannai and Takeda [?]
independently then described essentially the same O(n log n)-time algorithm for
computing a minimum palindromic factorization. Shortly thereafter, Kosolobov,
Rubinchik and Shur [?] gave an algorithm for recognizing strings with a given
palindromic length. Their result can be used to compute the palindromic length
ℓ of a string of length n in O(nℓ log ℓ) time. We also note that Gawrychowski,
Merkurev, Shur and Uznanski used similar techniques as Fici et al. and I et al.
for finding approximately the longest palindrome in a stream.

We call a factorization diverse if each of the factors is unique. Some well-known
factorizations, such as the LZ77 [?] and LZ78 [18] parses, are diverse (except that
the last factor may have appeared before). Fernau, Manea, Mercaş and Schmid [5]
recently proved that it is NP-complete to determine whether a given string has
a diverse factorization of at least a given size, and Schmid [?] has investigated
related questions. It seems natural to consider the problem of determining whether
a given string has a diverse factorization into palindromes. For example, bgikkpps
and bgikpspk each have exactly one such factorization, namely (b, g, i, kk, pp, s) and
(b, g, i, kpspk), respectively, but bgkpispk has none. This problem is obviously
in NP and in this paper we prove that it is NP-hard and, thus, NP-complete.
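
The three example strings above can be checked by exhaustive search. The following is a minimal brute-force sketch in Python (exponential time in the worst case, so only suitable for short strings); the function name `diverse_palindromic_factorizations` is ours and is not part of the paper.

```python
from typing import Iterator, List, Set


def diverse_palindromic_factorizations(s: str) -> Iterator[List[str]]:
    """Yield every factorization of s into pairwise-distinct palindromes."""
    used: Set[str] = set()   # factors already chosen on the current branch
    parts: List[str] = []    # the current partial factorization

    def extend(start: int) -> Iterator[List[str]]:
        if start == len(s):
            yield list(parts)  # copy, since parts is mutated while backtracking
            return
        for end in range(start + 1, len(s) + 1):
            factor = s[start:end]
            if factor == factor[::-1] and factor not in used:
                used.add(factor)
                parts.append(factor)
                yield from extend(end)
                parts.pop()
                used.remove(factor)

    yield from extend(0)


examples = ["bgikkpps", "bgikpspk", "bgkpispk"]
results = {s: list(diverse_palindromic_factorizations(s)) for s in examples}
for s in examples:
    print(s, results[s])
```

Membership in NP corresponds to the fact that a single candidate factorization can be verified in polynomial time; the search above simply tries all of them.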

We also show, proving a conjecture from the conference version of this paper [2],
that it is NP-complete for any fixed k to decide whether a given string can be
factored into palindromes that each appear at most k times in the factorization; we
call such a factorization k-diverse. Finally, since several recent papers (e.g., [?])
consider the effect of alphabet size on the difficulty of various string problems, we
show that the problems remain NP-complete even if the string is restricted to be
binary.

Fig. 1. Construction of NOT, AND and OR gates using NAND gates.


2. Outline 

In complexity theory, a Boolean circuit is formally a directed acyclic graph in which 
each node is either a source or one of a specified set of logic gates. The gates are 
usually AND, OR and NOT, with AND and OR gates each having in-degree at least 
2 and NOT gates each having in-degree 1. A gate’s predecessors and successors are
called its inputs and outputs, and sources and sinks are called the circuit’s inputs 
and outputs. A circuit with a single output is said to be satisfiable if and only if 
it is possible to assign each gate a value true or false such that the output is true 
and all the gates’ semantics are respected: e.g., each AND gate is true if and only if 
all its inputs are true, each OR gate is true if and only if at least one of its inputs 
is true, and each NOT gate is true if and only if its unique input is false. Notice 
that with these semantics, a truth assignment to the circuit’s inputs determines the 
truth values of all the gates. 

The circuit satisfiability problem [?] (see also, e.g., [?]) is to determine whether
a given single-output Boolean circuit C is satisfiable. It was one of the first problems 
proven NP-complete and is often the first such problem taught in undergraduate 
courses. We will show how to build, in time linear in the size of C, a string that 
has a diverse palindromic factorization if and only if C is satisfiable. It follows that 
diverse palindromic factorization is also NP-hard. Our construction is similar to the 
Tseitin Transform [?] from Boolean circuits to CNF formulas.

We can make each AND or OR gate’s in-degree 2 and each gate’s out-degree 1 at 
the cost of at most a logarithmic increase in the size and depth of the circuit, using 
splitter gates with one input and two outputs that should have the same truth value 
as the input. A NAND gate is true if and only if at least one of its inputs is false. 
AND, OR and NOT gates can be implemented with a constant number of NAND 
gates (see Fig. 1), so we assume without loss of generality that C is composed only
of NAND gates with two inputs and one output each and splitter gates. Boolean 
circuits are a model for real circuits, so henceforth we assume the gates’ semantics 
are respected, call the graph’s edges wires, say each splitter divides one wire in two, 
and discuss wires’ truth values instead of discussing the truth values of the gates 
at which those wires originate. 
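
The reduction to NAND gates can be checked mechanically. The sketch below uses the standard textbook constructions of NOT, AND and OR from NAND; since the drawing of Fig. 1 is not reproduced here, we only assume that it shows these (or equivalent) constructions. All function names are ours.

```python
from itertools import product


def nand(p: bool, q: bool) -> bool:
    """True if and only if at least one input is false."""
    return not (p and q)


def not_gate(p: bool) -> bool:
    # Feed the same value to both inputs (one splitter, one NAND gate).
    return nand(p, p)


def and_gate(p: bool, q: bool) -> bool:
    # Negate the NAND of the inputs (two NAND gates).
    return not_gate(nand(p, q))


def or_gate(p: bool, q: bool) -> bool:
    # De Morgan: p OR q = NAND(NOT p, NOT q) (three NAND gates).
    return nand(not_gate(p), not_gate(q))


truth_table = [
    (p, q, not_gate(p), and_gate(p, q), or_gate(p, q))
    for p, q in product([False, True], repeat=2)
]
for row in truth_table:
    print(row)
```

Each gate uses a constant number of NAND gates and splitters, which is all the argument needs.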

We assume each wire in C is labelled with a unique symbol (considering a split
to be the end of an incoming wire and the beginning of two new wires, so all three
wires have different labels). For each such symbol a, and some auxiliary symbols we
introduce during our construction, we use as characters in our construction three
related symbols: a itself, ā and x_a. We indicate an auxiliary symbol related to a by
writing a' or a''. We write x_a^j to denote j copies of x_a. We emphasize that, despite
their visual similarity, a and ā are separate characters, which play complementary
roles in our reduction. We use $ and # as generic separator symbols, which we
consider to be distinct (from each other and from all other symbols) for each use; to
prevent confusion, we add different superscripts to their different uses within the
same part of the construction.

Fig. 2. To construct the circuit above (computing XOR) we need to add wires a and b, split a into
c and d, split b into e and f, add gate A, split g into h and i, and finally add gates B, C and D.

We can build a sequence C_0, ..., C_t of subcircuits such that C_0 is empty, C_t = C
and, for 1 ≤ i ≤ t, we obtain C_i from C_{i-1} by one of the following operations (see
Fig. 2 for an example):

• adding a new wire (which is both an input and an output in C_i),

• splitting an output of C_{i-1} into two outputs,

• making two outputs of C_{i-1} the inputs of a new NAND gate.
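
As an illustration of such a sequence, the sketch below replays the operations listed in the caption of Fig. 2 and checks that the resulting circuit computes XOR. The caption does not say which wires feed which gate, nor how the gate outputs other than g are labelled, so the wiring and the labels j, k and l used here are assumptions; the tuple encoding of operations is likewise ours.

```python
from itertools import product

# Each entry is one of the three operations; the wiring of gates A-D is hypothetical.
OPS = [
    ("wire", "a"),
    ("wire", "b"),
    ("split", "a", "c", "d"),
    ("split", "b", "e", "f"),
    ("nand", "c", "e", "g"),   # gate A
    ("split", "g", "h", "i"),
    ("nand", "d", "h", "j"),   # gate B (output label j is assumed)
    ("nand", "f", "i", "k"),   # gate C (output label k is assumed)
    ("nand", "j", "k", "l"),   # gate D (output label l is assumed)
]


def evaluate(ops, inputs):
    """Replay the operations; return all wire values and the final outputs."""
    values = {}
    outputs = []  # wires that are currently outputs of the subcircuit
    for op in ops:
        if op[0] == "wire":
            values[op[1]] = inputs[op[1]]
            outputs.append(op[1])
        elif op[0] == "split":
            _, src, left, right = op
            outputs.remove(src)  # raises ValueError if src is not an output
            values[left] = values[right] = values[src]
            outputs += [left, right]
        else:  # "nand"
            _, p, q, out = op
            outputs.remove(p)
            outputs.remove(q)
            values[out] = not (values[p] and values[q])
            outputs.append(out)
    return values, outputs


for a, b in product([False, True], repeat=2):
    values, outputs = evaluate(OPS, {"a": a, "b": b})
    print(a, b, values["l"], outputs)
```

Because every operation consumes only current outputs, each wire has out-degree at most 1, as required above.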

We will show how to build in time linear in the size of C, inductively and in
turn, a sequence of strings S_1, ..., S_t such that S_i represents C_i according to the
following definitions:

Definition 1. A diverse palindromic factorization P of a string S_i encodes an
assignment τ to the inputs of a circuit C_i if the following conditions hold:

• if τ makes an output of C_i labelled a true, then a, x_a and x_a ā x_a are complete
factors in P but ā, x_a a x_a and x_a^j are not for j > 1;

• if τ makes an output of C_i labelled a false, then ā, x_a and x_a a x_a are complete
factors in P but a, x_a ā x_a and x_a^j are not for j > 1;

• if a is a label in C but not in C_i, then none of a, ā, x_a a x_a, x_a ā x_a and x_a^j
for j ≥ 1 are complete factors in P.

We say “complete factor” to emphasize the difference between factors in the 
factorization and their proper substrings; unfortunately, “factor” is sometimes used 
in the literature as a synonym for “substring”. 

Definition 2. A string S_i represents a circuit C_i if each assignment to the inputs
of C_i is encoded by some diverse palindromic factorization of S_i, and each diverse
palindromic factorization of S_i encodes some assignment to the inputs of C_i.

Once we have S_t, we can easily build in constant time a string S that has
a diverse palindromic factorization if and only if C is satisfiable. To do this, we
append $ # x_a a x_a to S_t, where $ and # are symbols not occurring in S_t and a is
the label on C's output. Since $ and # do not occur in S_t and occur as a pair of
consecutive characters in S, they must each be complete factors in any palindromic
factorization of S. Moreover, since x_a is already a complete factor in the
factorization of S_t, the appended x_a a x_a must be a single complete factor.
It follows that there is a diverse palindromic factorization of S if
and only if there is a diverse palindromic factorization of S_t in which x_a a x_a is not
a factor, which is the case if and only if there is an assignment to the inputs of C
that makes its output true.
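
For the smallest possible circuit, a single wire labelled a that is both input and output, this final step can be checked by brute force. In the sketch below we assume the single-wire string x_a a x_a ā x_a from the next section and encode x_a, a and ā as the characters x, a and A; the encoding and the helper name `dpf` are ours.

```python
from typing import Iterator, List, Set


def dpf(s: str) -> Iterator[List[str]]:
    """Yield all factorizations of s into pairwise-distinct palindromes."""
    used: Set[str] = set()
    parts: List[str] = []

    def extend(start: int) -> Iterator[List[str]]:
        if start == len(s):
            yield list(parts)
            return
        for end in range(start + 1, len(s) + 1):
            factor = s[start:end]
            if factor == factor[::-1] and factor not in used:
                used.add(factor)
                parts.append(factor)
                yield from extend(end)
                parts.pop()
                used.remove(factor)

    yield from extend(0)


# Encoding (ours): x = x_a, a = a, A = a-bar; $ and # are the separators.
S_t = "xaxAx"             # string for the one-wire circuit
S = S_t + "$#" + "xax"    # append $ # x_a a x_a
final = list(dpf(S))
print(final)
```

Only the factorization that encodes "a is true" survives, because the other one already uses x_a a x_a as a complete factor.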

3. Adding a Wire 

Suppose C_i is obtained from C_{i-1} by adding a new wire labelled a. If i = 1 then
we set S_i = x_a a x_a ā x_a, whose two diverse palindromic factorizations (x_a, a, x_a ā x_a)
and (x_a a x_a, ā, x_a) encode the assignments true and false to the wire labelled a,
which is both the input and output in C_i. If i > 1 then we set

S_i = S_{i-1} $ # x_a a x_a ā x_a ,

where $ and # are symbols not occurring in S_{i-1} and not equal to a', ā' or x_{a'} for
any label a' in C.

Since $ and # do not occur in S_{i-1} and occur as a pair of consecutive characters
in S_i, they must each be complete factors in any palindromic factorization
of S_i. Therefore, any diverse palindromic factorization of S_i is the concatenation
of a diverse palindromic factorization of S_{i-1} and either ($, #, x_a, a, x_a ā x_a) or
($, #, x_a a x_a, ā, x_a). Conversely, any diverse palindromic factorization of S_{i-1}
can be extended to a diverse palindromic factorization of S_i by appending either
($, #, x_a, a, x_a ā x_a) or ($, #, x_a a x_a, ā, x_a).

Assume S_{i-1} represents C_{i-1}. Let τ be an assignment to the inputs of C_i and let
P be a diverse palindromic factorization of S_{i-1} encoding τ restricted to the inputs
of C_{i-1}. If τ makes the input (and output) of C_i labelled a true, then P concatenated
with ($, #, x_a, a, x_a ā x_a) is a diverse palindromic factorization of S_i that encodes
τ. If τ makes that input false, then P concatenated with ($, #, x_a a x_a, ā, x_a) is a
diverse palindromic factorization of S_i that encodes τ. Therefore, each assignment
to the inputs of C_i is encoded by some diverse palindromic factorization of S_i.

Now let P be a diverse palindromic factorization of S_i and let τ be the assignment
to the inputs of C_{i-1} that is encoded by a prefix of P. If P ends with
($, #, x_a, a, x_a ā x_a) then P encodes the assignment to the inputs of C_i that makes
the input labelled a true and makes the other inputs true or false according to τ.
If P ends with ($, #, x_a a x_a, ā, x_a) then P encodes the assignment to the inputs
of C_i that makes the input labelled a false and makes the other inputs true or false
according to τ. Therefore, each diverse palindromic factorization of S_i encodes some
assignment to the inputs of C_i.

Lemma 3. We can build a string S_1 that represents C_1. If we have a string S_{i-1}
that represents C_{i-1} and C_i is obtained from C_{i-1} by adding a new wire, then in
constant time we can append symbols to S_{i-1} to obtain a string S_i that represents C_i.
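
The wire gadget is small enough to verify exhaustively. The sketch below builds the strings for one wire (labelled a) and for two wires (labelled a and b) and enumerates their diverse palindromic factorizations. The single-character encoding (x, a, A for x_a, a, ā and y, b, B for x_b, b, b̄) and the helper name `dpf` are ours.

```python
from typing import Iterator, List, Set


def dpf(s: str) -> Iterator[List[str]]:
    """Yield all factorizations of s into pairwise-distinct palindromes."""
    used: Set[str] = set()
    parts: List[str] = []

    def extend(start: int) -> Iterator[List[str]]:
        if start == len(s):
            yield list(parts)
            return
        for end in range(start + 1, len(s) + 1):
            factor = s[start:end]
            if factor == factor[::-1] and factor not in used:
                used.add(factor)
                parts.append(factor)
                yield from extend(end)
                parts.pop()
                used.remove(factor)

    yield from extend(0)


S1 = "xaxAx"                 # x_a a x_a a-bar x_a
S2 = S1 + "$#" + "ybyBy"     # S_1 $ # x_b b x_b b-bar x_b
f1 = list(dpf(S1))
f2 = list(dpf(S2))
print(f1)
for f in f2:
    print(f)
```

The two factorizations of the first string correspond to a being true or false, and the four factorizations of the second string correspond to the four assignments to a and b.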


4. Splitting a Wire 


Now suppose C_i is obtained from C_{i-1} by splitting an output of C_{i-1} labelled a
into two outputs labelled b and c. We set

S'_i = S_{i-1} $ # x_a^3 b' x_a a x_a c' x_a^5 $' #' x_a^7 b̄' x_a ā x_a c̄' x_a^9 ,

where $, $', #, #', b', b̄', c' and c̄' are symbols not occurring in S_{i-1} and not equal
to a', ā' or x_{a'} for any label a' in C.

Since $, $', # and #' do not occur in S_{i-1} and occur as pairs of consecutive
characters in S'_i, they must each be complete factors in any palindromic factorization
of S'_i. Therefore, a simple case analysis shows that any diverse palindromic
factorization of S'_i is the concatenation of a diverse palindromic factorization of
S_{i-1} and one of

($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^2, x_a^4, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8),
($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^4, x_a^2, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8),
($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^6, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8),
($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^3, x_a^6),
($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^6, x_a^3),
($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^9).

In any diverse palindromic factorization of S'_i, therefore, either b' and c' are complete
factors but b̄' and c̄' are not, or vice versa.
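
The case analysis can be replayed by brute force on the smallest instance, in which S_{i-1} is the one-wire string x_a a x_a ā x_a. The sketch below assumes the exponents 3, 5, 7 and 9 on the runs of x_a in the gadget, and uses a single-character encoding of our own: x, a, A for x_a, a, ā; p, P for b', b̄'; q, Q for c', c̄'; and $, #, %, & for the four separators.

```python
from typing import Iterator, List, Set


def dpf(s: str) -> Iterator[List[str]]:
    """Yield all factorizations of s into pairwise-distinct palindromes."""
    used: Set[str] = set()
    parts: List[str] = []

    def extend(start: int) -> Iterator[List[str]]:
        if start == len(s):
            yield list(parts)
            return
        for end in range(start + 1, len(s) + 1):
            factor = s[start:end]
            if factor == factor[::-1] and factor not in used:
                used.add(factor)
                parts.append(factor)
                yield from extend(end)
                parts.pop()
                used.remove(factor)

    yield from extend(0)


S_prev = "xaxAx"  # one wire labelled a
# $ # x^3 b' x a x c' x^5   $' #' x^7 b-bar' x a-bar x c-bar' x^9
gadget = ("$#" + "x" * 3 + "p" + "xax" + "q" + "x" * 5
          + "%&" + "x" * 7 + "P" + "xAx" + "Q" + "x" * 9)
split_facts = list(dpf(S_prev + gadget))
for f in split_facts:
    print(f)
```

Under these assumptions the search finds six factorizations, three for each truth value of a, and in each of them exactly one of the pairs (b', c') and (b̄', c̄') consists of complete factors.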

Conversely, any diverse palindromic factorization of S_{i-1} in which a, x_a and
x_a ā x_a are complete factors but ā, x_a a x_a and x_a^j are not for j > 1, can be extended
to a diverse palindromic factorization of S'_i by appending either of

($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^2, x_a^4, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8),
($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^6, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8);

any diverse palindromic factorization of S_{i-1} in which ā, x_a and x_a a x_a are complete
factors but a, x_a ā x_a and x_a^j are not for j > 1, can be extended to a diverse
palindromic factorization of S'_i by appending either of

($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^3, x_a^6),
($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^9).


We set

S_i = S'_i $'' #'' x_b b x_b b' x_b b̄' x_b b̄ x_b $''' #''' x_c c x_c c' x_c c̄' x_c c̄ x_c ,

where $'', #'', $''' and #''' are symbols not occurring in S'_i and not equal to a', ā'
or x_{a'} for any label a' in C. Since $'', #'', $''' and #''' do not occur in S'_i and occur
as pairs of consecutive characters in S_i, they must each be complete factors in any
palindromic factorization of S_i. Therefore, any diverse palindromic factorization of
S_i is the concatenation of a diverse palindromic factorization of S'_i and one of

($'', #'', x_b, b, x_b b' x_b, b̄', x_b b̄ x_b, $''', #''', x_c, c, x_c c' x_c, c̄', x_c c̄ x_c),
($'', #'', x_b b x_b, b', x_b b̄' x_b, b̄, x_b, $''', #''', x_c c x_c, c', x_c c̄' x_c, c̄, x_c).

Conversely, any diverse palindromic factorization of S'_i in which b' and c' are
complete factors but b̄' and c̄' are not, can be extended to a diverse palindromic
factorization of S_i by appending

($'', #'', x_b, b, x_b b' x_b, b̄', x_b b̄ x_b, $''', #''', x_c, c, x_c c' x_c, c̄', x_c c̄ x_c);

any diverse palindromic factorization of S'_i in which b̄' and c̄' are complete factors
but b' and c' are not, can be extended to a diverse palindromic factorization of S_i
by appending

($'', #'', x_b b x_b, b', x_b b̄' x_b, b̄, x_b, $''', #''', x_c c x_c, c', x_c c̄' x_c, c̄, x_c).


Assume S_{i-1} represents C_{i-1}. Let τ be an assignment to the inputs of C_{i-1}
and let P be a diverse palindromic factorization of S_{i-1} encoding τ. If τ makes the
output of C_{i-1} labelled a true, then P concatenated with, e.g.,

($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^2, x_a^4, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8,
$'', #'', x_b, b, x_b b' x_b, b̄', x_b b̄ x_b, $''', #''', x_c, c, x_c c' x_c, c̄', x_c c̄ x_c)

is a diverse palindromic factorization of S_i. Notice b, c, x_b, x_c, x_b b̄ x_b and x_c c̄ x_c are
complete factors but b̄, c̄, x_b b x_b, x_c c x_c, x_b^j and x_c^j for j > 1 are not. Therefore, this
concatenation encodes the assignment to the inputs of C_i that makes them true or
false according to τ.

If τ makes the output of C_{i-1} labelled a false, then P concatenated with, e.g.,

($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^3, x_a^6,
$'', #'', x_b b x_b, b', x_b b̄' x_b, b̄, x_b, $''', #''', x_c c x_c, c', x_c c̄' x_c, c̄, x_c)

is a diverse palindromic factorization of S_i. Notice b̄, c̄, x_b, x_c, x_b b x_b and x_c c x_c are
complete factors but b, c, x_b b̄ x_b, x_c c̄ x_c, x_b^j and x_c^j for j > 1 are not. Therefore, this
concatenation encodes the assignment to the inputs of C_i that makes them true or
false according to τ. Since C_{i-1} and C_i have the same inputs, each assignment to
the inputs of C_i is encoded by some diverse palindromic factorization of S_i.

Now let P be a diverse palindromic factorization of S_i and let τ be the assignment
to the inputs of C_{i-1} that is encoded by a prefix of P. If P ends with any
of

($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^2, x_a^4, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8),
($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^4, x_a^2, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8),
($, #, x_a^3, b', x_a a x_a, c', x_a^5, $', #', x_a^6, x_a b̄' x_a, ā, x_a c̄' x_a, x_a^8)

followed by

($'', #'', x_b, b, x_b b' x_b, b̄', x_b b̄ x_b, $''', #''', x_c, c, x_c c' x_c, c̄', x_c c̄ x_c),

then a must be a complete factor in the prefix of P encoding τ, so τ must make
the output of C_{i-1} labelled a true. Since b, c, x_b, x_c, x_b b̄ x_b and x_c c̄ x_c are complete
factors in P but b̄, c̄, x_b b x_b, x_c c x_c, x_b^j and x_c^j for j > 1 are not, P encodes the
assignment to the inputs of C_i that makes them true or false according to τ.

If P ends with any of

($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^3, x_a^6),
($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^6, x_a^3),
($, #, x_a^2, x_a b' x_a, a, x_a c' x_a, x_a^4, $', #', x_a^7, b̄', x_a ā x_a, c̄', x_a^9)

followed by

($'', #'', x_b b x_b, b', x_b b̄' x_b, b̄, x_b, $''', #''', x_c c x_c, c', x_c c̄' x_c, c̄, x_c),

then ā must be a complete factor in the prefix of P encoding τ, so τ must make
the output of C_{i-1} labelled a false. Since b̄, c̄, x_b, x_c, x_b b x_b and x_c c x_c are complete
factors but b, c, x_b b̄ x_b, x_c c̄ x_c, x_b^j and x_c^j for j > 1 are not, P encodes the assignment
to the inputs of C_i that makes them true or false according to τ.

Since these are all the possibilities for how P can end, each diverse palindromic 
factorization of Si encodes some assignment to the inputs of Ci. This gives us the 
following lemma: 

Lemma 4. If we have a string S_{i-1} that represents C_{i-1} and C_i is obtained from
C_{i-1} by splitting an output of C_{i-1} into two outputs, then in constant time we can
append symbols to S_{i-1} to obtain a string S_i that represents C_i.

5. Adding a NAND Gate 

Finally, suppose C_i is obtained from C_{i-1} by making two outputs of C_{i-1} labelled
a and b the inputs of a new NAND gate whose output is labelled c. Let C'_{i-1} be
the circuit obtained from C_{i-1} by splitting the output of C_{i-1} labelled a into two
outputs labelled a_1 and a_2, where a_1 and a_2 are symbols we use only here. Assuming
S_{i-1} represents C_{i-1}, we can use Lemma 4 to build in constant time a string S'_{i-1}
representing C'_{i-1}. We set

S'_i = S'_{i-1} $ # x_{c'}^3 a_1' x_{c'} a_1 x_{c'} ā_1 x_{c'} ā_1' x_{c'}^5
       $' #' x_{c'}^7 a_2' x_{c'} a_2 x_{c'} ā_2 x_{c'} ā_2' x_{c'}^9
       $'' #'' x_{c'}^{11} b' x_{c'} b x_{c'} b̄ x_{c'} b̄' x_{c'}^{13} ,

where all of the symbols in the suffix after S'_{i-1} are ones we use only here.

Since $, $', $'', #, #' and #'' do not occur in S'_{i-1} and occur as pairs of consecutive
characters in S'_i, they must each be complete factors in any palindromic
factorization of S'_i. Therefore, any diverse palindromic factorization of S'_i consists
of

(1) a diverse palindromic factorization of S'_{i-1},

(2) ($, #),

(3) a diverse palindromic factorization of x_{c'}^3 a_1' x_{c'} a_1 x_{c'} ā_1 x_{c'} ā_1' x_{c'}^5,

(4) ($', #'),

(5) a diverse palindromic factorization of x_{c'}^7 a_2' x_{c'} a_2 x_{c'} ā_2 x_{c'} ā_2' x_{c'}^9,

(6) ($'', #''),

(7) a diverse palindromic factorization of x_{c'}^{11} b' x_{c'} b x_{c'} b̄ x_{c'} b̄' x_{c'}^{13}.

If a_1 is a complete factor in the factorization of S'_{i-1}, then the diverse palindromic
factorization of

x_{c'}^3 a_1' x_{c'} a_1 x_{c'} ā_1 x_{c'} ā_1' x_{c'}^5

must include either

(a_1', x_{c'} a_1 x_{c'}, ā_1, x_{c'} ā_1' x_{c'}) or (a_1', x_{c'} a_1 x_{c'}, ā_1, x_{c'}, ā_1').

Notice that in the former case, the factorization need not contain x_{c'}. If ā_1 is a
complete factor in the factorization of S'_{i-1}, then the diverse palindromic factorization
of

x_{c'}^3 a_1' x_{c'} a_1 x_{c'} ā_1 x_{c'} ā_1' x_{c'}^5

must include either

(x_{c'} a_1' x_{c'}, a_1, x_{c'} ā_1 x_{c'}, ā_1') or (a_1', x_{c'}, a_1, x_{c'} ā_1 x_{c'}, ā_1').

Again, in the former case, the factorization need not contain x_{c'}. Symmetric
propositions hold for a_2 and b.

We set

S''_i = S'_i $† #† x_{c'}^{15} ā_1' x_{c'} c' x_{c'} b̄' x_{c'}^{17}
        $†† #†† x_{c'}^{19} ā_2' x_{c'} d x_{c'} b' x_{c'}^{21} ,

where $†, #†, $††, #††, c' and d are symbols we use only here. Any diverse palindromic
factorization of S''_i consists of

(1) a diverse palindromic factorization of S'_i,

(2) ($†, #†),

(3) a diverse palindromic factorization of x_{c'}^{15} ā_1' x_{c'} c' x_{c'} b̄' x_{c'}^{17},

(4) ($††, #††),

(5) a diverse palindromic factorization of x_{c'}^{19} ā_2' x_{c'} d x_{c'} b' x_{c'}^{21}.

Since a_1 and a_2 label outputs in C'_{i-1} split from the same output in C_{i-1}, it
follows that a_1 is a complete factor in a diverse palindromic factorization of S'_{i-1} if
and only if a_2 is. Therefore, we need consider only four cases:

Case 1: The factorization of S'_{i-1} includes a_1, a_2 and b as complete factors, so
the factorization of S'_i includes as complete factors either x_{c'} ā_1' x_{c'}, or ā_1' and x_{c'};
either x_{c'} ā_2' x_{c'}, or ā_2' and x_{c'}; either x_{c'} b̄' x_{c'}, or b̄' and x_{c'}; and b'. Trying all the
combinations (there are only four, since x_{c'} can appear as a complete factor at
most once) shows that any diverse palindromic factorization of S''_i includes one
of

(ā_1', x_{c'} c' x_{c'}, b̄', ..., ā_2', x_{c'}, d, x_{c'} b' x_{c'}),
(ā_1', x_{c'} c' x_{c'}, b̄', ..., x_{c'} ā_2' x_{c'}, d, x_{c'} b' x_{c'}),

with the latter only possible if x_{c'} appears earlier in the factorization.

Case 2: The factorization of S'_{i-1} includes a_1, a_2 and b̄ as complete factors, so
the factorization of S'_i includes as complete factors either x_{c'} ā_1' x_{c'}, or ā_1' and x_{c'};
either x_{c'} ā_2' x_{c'}, or ā_2' and x_{c'}; b̄'; and either x_{c'} b' x_{c'}, or b' and x_{c'}. Trying all the
combinations shows that any diverse palindromic factorization of S''_i includes one
of

(ā_1', x_{c'}, c', x_{c'} b̄' x_{c'}, ..., ā_2', x_{c'} d x_{c'}, b'),
(x_{c'} ā_1' x_{c'}, c', x_{c'} b̄' x_{c'}, ..., ā_2', x_{c'} d x_{c'}, b'),

with the latter only possible if x_{c'} appears earlier in the factorization.

Case 3: The factorization of S'_{i-1} includes ā_1, ā_2 and b as complete factors, so the
factorization of S'_i includes as complete factors ā_1'; ā_2'; either x_{c'} b̄' x_{c'}, or b̄' and x_{c'};
and b'. Trying all the combinations shows that any diverse palindromic factorization
of S''_i includes one of

(x_{c'} ā_1' x_{c'}, c', x_{c'}, b̄', ..., x_{c'} ā_2' x_{c'}, d, x_{c'} b' x_{c'}),
(x_{c'} ā_1' x_{c'}, c', x_{c'} b̄' x_{c'}, ..., x_{c'} ā_2' x_{c'}, d, x_{c'} b' x_{c'}),

with the latter only possible if x_{c'} appears earlier in the factorization.

Case 4: The factorization of S'_{i-1} includes ā_1, ā_2 and b̄ as complete factors, so the
factorization of S'_i includes as complete factors ā_1'; ā_2'; b̄'; and either x_{c'} b' x_{c'}, or b' and
x_{c'}. Trying all the combinations shows that any diverse palindromic factorization
of S''_i that extends the factorization of S'_i includes one of

(x_{c'} ā_1' x_{c'}, c', x_{c'} b̄' x_{c'}, ..., x_{c'} ā_2' x_{c'}, d, x_{c'}, b'),
(x_{c'} ā_1' x_{c'}, c', x_{c'} b̄' x_{c'}, ..., x_{c'} ā_2' x_{c'}, d, x_{c'} b' x_{c'}),

with the latter only possible if x_{c'} appears earlier in the factorization.

Summing up, any diverse palindromic factorization of S''_i always includes x_{c'}
and includes either x_{c'} c' x_{c'} if the factorization of S'_{i-1} includes a_1, a_2 and b as
complete factors, or c' otherwise.

We set

S'''_i = S''_i $††† #††† x_{c'}^{23} c'' x_{c'} c' x_{c'} c̄' x_{c'} c̄'' x_{c'}^{25} ,

where $††† and #††† are symbols we use only here. Any diverse palindromic
factorization of S'''_i consists of

(1) a diverse palindromic factorization of S''_i,

(2) ($†††, #†††),

(3) a diverse palindromic factorization of x_{c'}^{23} c'' x_{c'} c' x_{c'} c̄' x_{c'} c̄'' x_{c'}^{25}.

Since x_{c'} must appear as a complete factor in the factorization of S''_i, if c' is a
complete factor in the factorization of S''_i, then the factorization of

x_{c'}^{23} c'' x_{c'} c' x_{c'} c̄' x_{c'} c̄'' x_{c'}^{25}

must include

(c'', x_{c'} c' x_{c'}, c̄', x_{c'} c̄'' x_{c'});

otherwise, it must include

(x_{c'} c'' x_{c'}, c', x_{c'} c̄' x_{c'}, c̄'').

That is, the factorization of x_{c'}^{23} c'' x_{c'} c' x_{c'} c̄' x_{c'} c̄'' x_{c'}^{25} includes c'', x_{c'} and x_{c'} c̄'' x_{c'}
but not c̄'' or x_{c'} c'' x_{c'}, if and only if the factorization of S''_i includes c'; otherwise,
it includes c̄'', x_{c'} and x_{c'} c'' x_{c'} but not c'' or x_{c'} c̄'' x_{c'}.
We set

S_i = S'''_i $‡ #‡ x_c c x_c c'' x_c c̄'' x_c c̄ x_c ,

where $‡, #‡, c, c̄ and x_c are symbols that do not appear in S'''_i. Any diverse
palindromic factorization of S_i consists of

(1) a diverse palindromic factorization of S'''_i,

(2) ($‡, #‡),

(3) a diverse palindromic factorization of x_c c x_c c'' x_c c̄'' x_c c̄ x_c.

Since exactly one of c'' and c̄'' must appear as a complete factor in the
factorization of S'''_i, the factorization of

x_c c x_c c'' x_c c̄'' x_c c̄ x_c

must be either

(x_c, c, x_c c'' x_c, c̄'', x_c c̄ x_c)

or

(x_c c x_c, c'', x_c c̄'' x_c, c̄, x_c).

Thus if c'' is a complete factor in the factorization of S'''_i, then c, x_c and x_c c̄ x_c are
complete factors in the factorization of S_i but c̄, x_c c x_c and x_c^j are not for j > 1;
otherwise, c̄, x_c and x_c c x_c are complete factors but c, x_c c̄ x_c and x_c^j are not for
j > 1.
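
This last step can be checked in isolation. The sketch below enumerates the diverse palindromic factorizations of the final block when either c'' or c̄'' has already been used as a complete factor. It assumes the block reads x_c c x_c c'' x_c c̄'' x_c c̄ x_c, and encodes x_c, c, c̄, c'', c̄'' as the characters y, c, C, d, D; the encoding and the helper name `dpf_avoiding` are ours.

```python
from typing import Iterator, List, Set


def dpf_avoiding(s: str, already_used: Set[str]) -> Iterator[List[str]]:
    """Yield factorizations of s into distinct palindromes, none in already_used."""
    used: Set[str] = set(already_used)
    parts: List[str] = []

    def extend(start: int) -> Iterator[List[str]]:
        if start == len(s):
            yield list(parts)
            return
        for end in range(start + 1, len(s) + 1):
            factor = s[start:end]
            if factor == factor[::-1] and factor not in used:
                used.add(factor)
                parts.append(factor)
                yield from extend(end)
                parts.pop()
                used.remove(factor)

    yield from extend(0)


# Encoding (ours): y = x_c, c = c, C = c-bar, d = c'', D = c-bar''.
BLOCK = "ycydyDyCy"
after_c2 = list(dpf_avoiding(BLOCK, {"d"}))      # c'' used earlier
after_c2bar = list(dpf_avoiding(BLOCK, {"D"}))   # c-bar'' used earlier
print(after_c2)
print(after_c2bar)
```

In each case exactly one factorization remains, and it contains c as a complete factor precisely when c'' was used earlier, which matches the two alternatives stated above.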

Assume S_{i-1} represents C_{i-1}. Let τ be an assignment to the inputs of C_{i-1} and
let P be a diverse palindromic factorization of S_{i-1} encoding τ. By Lemma 4 we
can extend P to P' so that it encodes the assignment to the inputs of C'_{i-1} that
makes them true or false according to τ. There are four cases to consider:

Case 1: τ makes the outputs of C_{i-1} labelled a and b both true. Then P'
concatenated with, e.g.,

($, #, x_{c'}^3, a_1', x_{c'} a_1 x_{c'}, ā_1, x_{c'} ā_1' x_{c'}, x_{c'}^4,
$', #', x_{c'}^7, a_2', x_{c'} a_2 x_{c'}, ā_2, x_{c'} ā_2' x_{c'}, x_{c'}^8,
$'', #'', x_{c'}^{11}, b', x_{c'} b x_{c'}, b̄, x_{c'} b̄' x_{c'}, x_{c'}^{12})

is a diverse palindromic factorization P'' of S'_i which, concatenated with, e.g.,

($†, #†, x_{c'}^{15}, ā_1', x_{c'} c' x_{c'}, b̄', x_{c'}^{17},
$††, #††, x_{c'}^{19}, ā_2', x_{c'}, d, x_{c'} b' x_{c'}, x_{c'}^{20})

is a diverse palindromic factorization P''' of S''_i which, concatenated with, e.g.,

($†††, #†††, x_{c'}^{22}, x_{c'} c'' x_{c'}, c', x_{c'} c̄' x_{c'}, c̄'', x_{c'}^{25})

is a diverse palindromic factorization P† of S'''_i which, concatenated with

($‡, #‡, x_c c x_c, c'', x_c c̄'' x_c, c̄, x_c)

is a diverse palindromic factorization P‡ of S_i in which c̄, x_c and x_c c x_c are complete
factors but c, x_c c̄ x_c and x_c^j are not for j > 1.

Case 2: τ makes the output of C_{i-1} labelled a true but the output labelled b false.
Then P' concatenated with, e.g.,

($, #, x_{c'}^3, a_1', x_{c'} a_1 x_{c'}, ā_1, x_{c'} ā_1' x_{c'}, x_{c'}^4,
$', #', x_{c'}^7, a_2', x_{c'} a_2 x_{c'}, ā_2, x_{c'} ā_2' x_{c'}, x_{c'}^8,
$'', #'', x_{c'}^{10}, x_{c'} b' x_{c'}, b, x_{c'} b̄ x_{c'}, b̄', x_{c'}^{13})

is a diverse palindromic factorization P'' of S'_i which, concatenated with, e.g.,

($†, #†, x_{c'}^{15}, ā_1', x_{c'}, c', x_{c'} b̄' x_{c'}, x_{c'}^{16},
$††, #††, x_{c'}^{19}, ā_2', x_{c'} d x_{c'}, b', x_{c'}^{21})

is a diverse palindromic factorization P''' of S''_i which, concatenated with, e.g.,

($†††, #†††, x_{c'}^{23}, c'', x_{c'} c' x_{c'}, c̄', x_{c'} c̄'' x_{c'}, x_{c'}^{24})

is a diverse palindromic factorization P† of S'''_i which, concatenated with

($‡, #‡, x_c, c, x_c c'' x_c, c̄'', x_c c̄ x_c)

is a diverse palindromic factorization P‡ of S_i in which c, x_c c̄ x_c and x_c are complete
factors but c̄, x_c c x_c and x_c^j are not for j > 1.


Case 3: τ makes the output of C_{i-1} labelled a false but the output labelled b true.
Then P' concatenated with, e.g.,

($, #, x_{c'}^2, x_{c'} a_1' x_{c'}, a_1, x_{c'} ā_1 x_{c'}, ā_1', x_{c'}^5,
$', #', x_{c'}^6, x_{c'} a_2' x_{c'}, a_2, x_{c'} ā_2 x_{c'}, ā_2', x_{c'}^9,
$'', #'', x_{c'}^{11}, b', x_{c'} b x_{c'}, b̄, x_{c'} b̄' x_{c'}, x_{c'}^{12})

is a diverse palindromic factorization P'' of S'_i which, concatenated with, e.g.,

($†, #†, x_{c'}^{14}, x_{c'} ā_1' x_{c'}, c', x_{c'}, b̄', x_{c'}^{17},
$††, #††, x_{c'}^{18}, x_{c'} ā_2' x_{c'}, d, x_{c'} b' x_{c'}, x_{c'}^{20})

is a diverse palindromic factorization P''' of S''_i which, concatenated with, e.g.,

($†††, #†††, x_{c'}^{23}, c'', x_{c'} c' x_{c'}, c̄', x_{c'} c̄'' x_{c'}, x_{c'}^{24})

is a diverse palindromic factorization P† of S'''_i which, concatenated with

($‡, #‡, x_c, c, x_c c'' x_c, c̄'', x_c c̄ x_c)

is a diverse palindromic factorization P‡ of S_i in which c, x_c c̄ x_c and x_c are complete
factors but c̄, x_c c x_c and x_c^j are not for j > 1.


Case 4: τ makes the outputs of C_{i-1} labelled a and b both false. Then P'
concatenated with, e.g.,

($, #, x_{c'}^2, x_{c'} a_1' x_{c'}, a_1, x_{c'} ā_1 x_{c'}, ā_1', x_{c'}^5,
$', #', x_{c'}^6, x_{c'} a_2' x_{c'}, a_2, x_{c'} ā_2 x_{c'}, ā_2', x_{c'}^9,
$'', #'', x_{c'}^{10}, x_{c'} b' x_{c'}, b, x_{c'} b̄ x_{c'}, b̄', x_{c'}^{13})

is a diverse palindromic factorization P'' of S'_i which, concatenated with, e.g.,

($†, #†, x_{c'}^{14}, x_{c'} ā_1' x_{c'}, c', x_{c'} b̄' x_{c'}, x_{c'}^{16},
$††, #††, x_{c'}^{18}, x_{c'} ā_2' x_{c'}, d, x_{c'}, b', x_{c'}^{21})

is a diverse palindromic factorization P''' of S''_i which, concatenated with, e.g.,

($†††, #†††, x_{c'}^{23}, c'', x_{c'} c' x_{c'}, c̄', x_{c'} c̄'' x_{c'}, x_{c'}^{24})

is a diverse palindromic factorization P† of S'''_i which, concatenated with

($‡, #‡, x_c, c, x_c c'' x_c, c̄'', x_c c̄ x_c)

is a diverse palindromic factorization P‡ of S_i in which c, x_c c̄ x_c and x_c are complete
factors but c̄, x_c c x_c and x_c^j are not for j > 1.


Notice that in all cases P‡ encodes the assignment to the inputs of C_i that makes
them true or false according to τ. Since C_{i-1} and C_i have the same inputs, each
assignment to the inputs of C_i is encoded by some diverse palindromic factorization
of S_i.

Now let P be a diverse palindromic factorization of S_i and let τ be the assignment
to the inputs of C_{i-1} that is encoded by a prefix of P. Consider the prefix of P
that is a diverse palindromic factorization of S'_{i-1}. Since a_1 and a_2 are obtained
by splitting a, it follows that a_1 is a complete factor of this prefix if and only if
a_2 is. Therefore, in what follows we only consider diverse palindromic factorizations
P of S_i in which either both a_1 and a_2 are complete factors, or neither a_1 nor a_2
is a complete factor.
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Let P' be the prefix of P that is a diverse palindromic factorization of S'/'. 
Case A: Suppose the factorization of 


23 // 

x^> c Xr 


'd’xj 


in P' includes c" as a complete factor, which is the case if and only if P includes c, 
Xc and XcCXc as complete factors but not c, XcCXc and x^ for j > 1. We will show 
that T must make the outputs of Ci-i labelled a and b true. Let P" be the prefix 
of P' that is a diverse palindromic factorization of S'^. Since c" is a complete factor 
in the factorization of 

23 // / ~7 “77 25 

X^, C Xc'C Xc'CXc'C'x^, 


in P', so is c'. Therefore, c' is not a complete factor in the factorization of 

x]j a'lXc'c'xc’b'x]] 

in P", so a'l and b' are. 

Let $P'''$ be the prefix of $P''$ that is a diverse palindromic factorization of $S'_i$.
Since $\overline{a_1'}$ and $\overline{b'}$ are complete factors later in $P''$, they are not complete factors in
$P'''$. Therefore, $\overline{a_1}$ and $\overline{b}$ are complete factors in the factorizations of

x^ia'iXc'aiXc'diXc'a'iX^ and x]}b'xc'bxc'bxc'b'x]^ 

in $P'''$, so they are not complete factors in the prefix of $P$ that is a diverse
palindromic factorization of $S'_{i-1}$. Since we built $S'_{i-1}$ from $S_{i-1}$ with Lemma 4,
it follows that $a_1$ and $b$ are complete factors in the prefix of $P$ that encodes $\tau$.
Therefore, $\tau$ makes the outputs of $C_{i-1}$ labelled $a$ and $b$ true.

Case B: Suppose the factorization of

$x_{c'}^{23} c'' x_{c'} c' x_{c'} \overline{c'} x_{c'} \overline{c''} x_{c'}^{25}$

in $P'$ does not include $\overline{c''}$ as a complete factor, which implies that it does include
$x_{c'} \overline{c''} x_{c'}$ as a complete factor. Since, as noted earlier, we can assume that $a_1$ is a
complete factor of $P$ if and only if $a_2$ is, it follows that the factorization of

$x_{c'}^{23} c'' x_{c'} c' x_{c'} \overline{c'} x_{c'} \overline{c''} x_{c'}^{25}$

must include

$(c'',\ x_{c'} c' x_{c'},\ \overline{c'},\ x_{c'} \overline{c''} x_{c'})$.

Then, $P$ must include $x_c$, $c$ and $\overline{c''}$ as complete factors. We will show that $\tau$ must
make at least one of the outputs of $C_{i-1}$ labelled $a$ or $b$ false. Let $P''$ be the prefix
of $P'$ that is a diverse palindromic factorization of $S''_i$. Since $x_{c'} c' x_{c'}$ is a complete
factor in the factorization of

$x_{c'}^{23} c'' x_{c'} c' x_{c'} \overline{c'} x_{c'} \overline{c''} x_{c'}^{25}$

in $P'$, $c'$ is a complete factor in the factorization of

X J? a'^ Xc' c'xc' b'x^J 
in $P''$. Then, the factorization of

x]^ a[ Xc' dxc' b'x]J 

must include one of the following three: 

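$(x_{c'} \overline{a_1'} x_{c'},\ c',\ x_{c'} \overline{b'} x_{c'})$,   (1)

$(x_{c'} \overline{a_1'} x_{c'},\ c',\ x_{c'},\ \overline{b'})$,   (2)

$(\overline{a_1'},\ x_{c'},\ c',\ x_{c'} \overline{b'} x_{c'})$.   (3)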

Case B-a: Assume the factorization of the string displayed above includes (1). Let $P'''$ be
the prefix of $P''$ that is a diverse palindromic factorization of $S'_i$. Since $\overline{a_1'}$
and $\overline{b'}$ are not complete factors later in $P''$, they are complete factors in
$P'''$. Therefore, there are five combinations of factorizations of

x%a'iXc'aiXc''aiXc'a'iX% and x\}b' Xc'bxc'bxc'Vx^ 

in $P'''$, as follows:

Case B-a1: The factorizations include

$(x_{c'} a_1' x_{c'},\ a_1,\ x_{c'} \overline{a_1} x_{c'},\ \overline{a_1'})$ and $(x_{c'} b' x_{c'},\ b,\ x_{c'} \overline{b} x_{c'},\ \overline{b'})$.

In this case, $a_1$ and $b$ are not complete factors in the prefix of $P$ that
encodes $\tau$. Therefore, $\tau$ makes both the outputs of $C_{i-1}$ labelled $a$ and
$b$ false.

Case B-a2: The factorizations include

$(x_{c'} a_1' x_{c'},\ a_1,\ x_{c'} \overline{a_1} x_{c'},\ \overline{a_1'})$ and $(b',\ x_{c'} b x_{c'},\ \overline{b},\ x_{c'},\ \overline{b'})$.

In this case, $a_1$ is not a complete factor and $b$ is a complete factor in
the prefix of $P$ that encodes $\tau$. Therefore, $\tau$ makes the outputs of $C_{i-1}$
labelled $a$ false and $b$ true.

Case B-a3: The factorizations include

$(a_1',\ x_{c'} a_1 x_{c'},\ \overline{a_1},\ x_{c'},\ \overline{a_1'})$ and $(x_{c'} b' x_{c'},\ b,\ x_{c'} \overline{b} x_{c'},\ \overline{b'})$.

In this case, $a_1$ is a complete factor and $b$ is not a complete factor in
the prefix of $P$ that encodes $\tau$. Therefore, $\tau$ makes the outputs of $C_{i-1}$
labelled $a$ true and $b$ false.

Case B-a4: The factorizations include

$(a_1',\ x_{c'},\ a_1,\ x_{c'} \overline{a_1} x_{c'},\ \overline{a_1'})$ and $(x_{c'} b' x_{c'},\ b,\ x_{c'} \overline{b} x_{c'},\ \overline{b'})$.

In this case, $a_1$ and $b$ are not complete factors in the prefix of $P$ that
encodes $\tau$. Therefore, $\tau$ makes both the outputs of $C_{i-1}$ labelled $a$ and
$b$ false.

Case B-a5: The factorizations include

$(x_{c'} a_1' x_{c'},\ a_1,\ x_{c'} \overline{a_1} x_{c'},\ \overline{a_1'})$ and $(b',\ x_{c'},\ b,\ x_{c'} \overline{b} x_{c'},\ \overline{b'})$.

In this case, $a_1$ and $b$ are not complete factors in the prefix of $P$ that
encodes $\tau$. Therefore, $\tau$ makes both the outputs of $C_{i-1}$ labelled $a$ and
$b$ false.

Case B-b: Assume the factorization of the string above that contains $\overline{a_1'}$, $c'$ and $\overline{b'}$
includes (2). Let $P''$ be the prefix of $P'$ that is a diverse palindromic factorization
of $S''_i$. Let $P'''$ be the prefix of $P''$ that is a diverse palindromic factorization of
$S'_i$. Since $\overline{a_1'}$ and $x_{c'} \overline{b'} x_{c'}$ are not complete factors later in $P''$, they are
complete factors in $P'''$. Therefore, the factorizations of

x^^ia'iXc'aiXc’CLiXc'a'ix/ and x\}b'xc'hxc'hxc'b'x^ 
must include

$(x_{c'} a_1' x_{c'},\ a_1,\ x_{c'} \overline{a_1} x_{c'},\ \overline{a_1'})$ and $(b',\ x_{c'} b x_{c'},\ \overline{b},\ x_{c'} \overline{b'} x_{c'})$

in $P'''$. Then $a_1$ is not a complete factor and $b$ is a complete factor in the
prefix of $P$ that encodes $\tau$. Therefore, $\tau$ makes the outputs of $C_{i-1}$ labelled
$a$ false and $b$ true.

Case B-c: Assume the factorization of the string above that contains $\overline{a_1'}$, $c'$ and $\overline{b'}$
includes (3). Let $P''$ be the prefix of $P'$ that is a diverse palindromic factorization
of $S''_i$. Let $P'''$ be the prefix of $P''$ that is a diverse palindromic factorization of
$S'_i$. Since $x_{c'} \overline{a_1'} x_{c'}$ and $\overline{b'}$ are not complete factors later in $P''$, they are
complete factors in $P'''$. Therefore, the factorizations of

x^ia'iXc’aiXc'CLiXc'a'ix/ and x\}b'xc'bxc'bxc'b'x^ 
must include

$(a_1',\ x_{c'} a_1 x_{c'},\ \overline{a_1},\ x_{c'} \overline{a_1'} x_{c'})$ and $(x_{c'} b' x_{c'},\ b,\ x_{c'} \overline{b} x_{c'},\ \overline{b'})$

in $P'''$. Then $a_1$ is a complete factor and $b$ is not a complete factor in the
prefix of $P$ that encodes $\tau$. Therefore, $\tau$ makes the outputs of $C_{i-1}$ labelled
$a$ true and $b$ false.

The above arguments give the following lemma. 

Lemma 5. If we have a string $S_{i-1}$ that represents $C_{i-1}$ and $C_i$ is obtained from
$C_{i-1}$ by making two outputs of $C_{i-1}$ the inputs of a new NAND gate, then in
constant time we can append symbols to $S_{i-1}$ to obtain a string $S_i$ that represents $C_i$.
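
As a reading aid, the outcome of the case analysis behind Lemma 5 can be tabulated and compared with the NAND truth table. The sketch below is only a summary under stated assumptions: it takes Case A to be the case in which the new output $c$ is false and all Case B subcases to be those in which $c$ is true, and the names `CASES`, `nand`, `cases_match_nand` and `assignments_covered` are hypothetical helpers, not part of the construction.

```python
# Each entry: (value forced for a, value forced for b, value of the new output c).
# Assumption: Case A is the only case in which c is false.
CASES = {
    "A":    (True,  True,  False),
    "B-a1": (False, False, True),
    "B-a2": (False, True,  True),
    "B-a3": (True,  False, True),
    "B-a4": (False, False, True),
    "B-a5": (False, False, True),
    "B-b":  (False, True,  True),
    "B-c":  (True,  False, True),
}


def nand(a, b):
    """Truth value of a NAND gate with inputs a and b."""
    return not (a and b)


def cases_match_nand(cases=CASES):
    """True if every tabulated case agrees with the NAND truth table."""
    return all(c == nand(a, b) for a, b, c in cases.values())


def assignments_covered(cases=CASES):
    """Set of (a, b) input assignments that occur in some case."""
    return {(a, b) for a, b, _ in cases.values()}
```

Under these assumptions every case agrees with NAND and all four input assignments occur, which is what the proof needs from the backward direction.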

6. Summing Up 

By Lemmas 3, 4 and 5 and induction, given a Boolean circuit $C$ composed only of
splitters and NAND gates with two inputs and one output, in time linear in the size
of $C$ we can build, inductively and in turn, a sequence of strings $S_1, \ldots, S_t$ such that
$S_i$ represents $C_i$. As mentioned in Section 2, once we have $S_t$ we can easily build in
constant time a string $S$ that has a diverse palindromic factorization if and only if
$C$ is satisfiable. Therefore, diverse palindromic factorization is NP-hard. Since it is
obviously in NP, we have the following theorem:

Theorem 6. Diverse palindromic factorization is NP-complete.
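
Membership in NP rests on the fact that a proposed factorization can be checked quickly. The following is a minimal sketch of such a check, not taken from the paper; the function names are hypothetical and strings are modelled as Python `str` values.

```python
def is_palindrome(s):
    """True if s reads the same forwards and backwards."""
    return s == s[::-1]


def is_diverse_palindromic_factorization(s, factors):
    """Check a certificate: the factors concatenate to s, each factor is a
    non-empty palindrome, and no factor occurs more than once."""
    factors = list(factors)
    return (
        "".join(factors) == s
        and all(len(f) > 0 and is_palindrome(f) for f in factors)
        and len(set(factors)) == len(factors)
    )
```

For instance, the factorization aba, aa, aabaa of abaaaaabaa from the Introduction passes this check, while (a, a) for aa fails because the factor a is repeated.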

7. k-Diverse Factorization

It is not difficult to check that our reduction is still correct even if factors of the
forms $\$$, $\#$ and $x^j$ for $j > 1$ can appear arbitrarily often in the factorization, as
long as factors of the forms $a$, $x$ and $xax$ can each appear at most once. (By "of the
form" we mean equal up to subscripts, bars and superscripts apart from exponents;
$a$ stands for any letter except $x$.) It follows that it is still NP-complete to decide for
any fixed $k$ whether a string can be factored into palindromes that each appear at
most $k$ times in the factorization.

Suppose we are given $k$ and a Boolean circuit $C$ composed only of splitters and
NAND gates with two inputs and one output. In linear time we can build, as we
have described, a string $S$ such that $S$ has a diverse palindromic factorization if
and only if $C$ is satisfiable. In linear time we can then build a string $T$ as follows:
we start with $T$ equal to the empty string; for each substring of $S$ of the form $a$, we
append to $T$ a substring of the form

$\$_1 \#_1\, a\, \$_2 \#_2\, a\, \$_3 \#_3 \cdots a\, \$_k \#_k$ ,

where $\$_1, \ldots, \$_k, \#_1, \ldots, \#_k$ are symbols we use only here; for each substring of $S$
of the form $x$, we append to $T$ a substring of the form

$\$'_1 \#'_1\, x\, \$'_2 \#'_2\, x\, \$'_3 \#'_3 \cdots x\, \$'_k \#'_k$ ,

where $\$'_1, \ldots, \$'_k, \#'_1, \ldots, \#'_k$ are symbols we use only here; for each substring of $S$
of the form $xax$, we append to $T$ a substring of the form

$\$''_1 \#''_1\, xax\, \$''_2 \#''_2\, xax\, \$''_3 \#''_3 \cdots \$''_{k-1} \#''_{k-1}\, xax\, \$''_k \#''_k$ ,

where $\$''_1, \ldots, \$''_k, \#''_1, \ldots, \#''_k$ are symbols we use only here.

Notice that the only $k$-diverse palindromic factorization of $T$ includes each substring
of $S$ of the forms $a$, $x$ and $xax$ exactly $k - 1$ times each. In particular, any
substring of $T$ of the form $xax$ cannot be factored into $(x, a, x)$, because $x$ must
appear $k - 1$ times elsewhere in the factorization. Therefore, there is a $k$-diverse
palindromic factorization of $S\,\$\,\#\,T$, where $\$$ and $\#$ are symbols we use only here,
if and only if there is a diverse palindromic factorization of $S$ and, thus, if and only
if $C$ is satisfiable. This implies the following generalization of Theorem 6:

Theorem 7. For any fixed k > 1, k-diverse palindromic factorization is NP- 
complete. 
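
To make the definition of a $k$-diverse palindromic factorization concrete, here is a small exhaustive search. It is only an illustrative sketch with a hypothetical function name: it runs in exponential time in the worst case, which is consistent with (but of course does not prove) the hardness result above.

```python
from collections import Counter


def find_k_diverse_factorization(s, k=1):
    """Return one factorization of s into palindromes in which every distinct
    factor is used at most k times, or None if no such factorization exists."""
    used = Counter()   # how often each palindrome has been used so far
    chosen = []        # factors picked on the current search path

    def extend(start):
        if start == len(s):
            return True
        for end in range(start + 1, len(s) + 1):
            f = s[start:end]
            if f == f[::-1] and used[f] < k:
                used[f] += 1
                chosen.append(f)
                if extend(end):
                    return True
                chosen.pop()
                used[f] -= 1
        return False

    return list(chosen) if extend(0) else None
```

For example, abca has only single-symbol palindromes, so it has no diverse ($k = 1$) factorization, whereas for $k = 2$ the search returns the factors a, b, c, a.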

8. Binary Alphabet 

The reduction described above involves multiple distinct symbols for each component
of the circuit and thus requires an unbounded alphabet, but we will next show
that a binary alphabet is sufficient.

Let $S$ be an arbitrary string and let $\Sigma$ be the set of distinct symbols occurring
in $S$. Let $\delta$ be an (arbitrary) bijective mapping $\delta : \Sigma \to \{b a^i b : i \in [1..|\Sigma|]\}$. We will
also use $\delta$ to denote the implied mapping from $\Sigma^*$ to $\{a, b\}^*$ defined recursively by
$\delta(X\alpha) = \delta(X) \cdot \delta(\alpha)$ for any $X \in \Sigma^*$ and $\alpha \in \Sigma$.
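
The mapping $\delta$ is easy to realize in code. The sketch below is one possible choice, assuming symbols are single Python characters and that the bijection is fixed by sorting the symbols; `make_delta`, `encode` and the parameter `first_exponent` are hypothetical names (the parameter makes it possible to start the exponents at a value other than 1).

```python
def make_delta(s, first_exponent=1):
    """Assign to the distinct symbols of s the codes b a^i b for
    i = first_exponent, first_exponent + 1, ... (one code per symbol)."""
    symbols = sorted(set(s))  # any fixed order yields a valid bijection
    return {c: "b" + "a" * (first_exponent + i) + "b"
            for i, c in enumerate(symbols)}


def encode(s, delta):
    """Extend delta from symbols to strings by concatenating the codes."""
    return "".join(delta[c] for c in s)
```

With this choice, the string xyx is mapped to bab·baab·bab = babbaabbab, which is again a palindrome.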

Notice that $\delta$ preserves palindromes, i.e., for any palindrome $P \in \Sigma^*$, $\delta(P)$
is a palindrome too. Thus, if $\mathcal{P} = (P_1, P_2, \ldots, P_k)$ is a palindromic factorization
of $S$, then $\delta(\mathcal{P}) = (\delta(P_1), \delta(P_2), \ldots, \delta(P_k))$ is a palindromic factorization of $\delta(S)$.
Furthermore, any palindrome in $\delta(S)$ of the form $(b a^+ b)^+$ must be a preserved
palindrome, i.e., an image $\delta(P)$ of a palindrome $P$ occurring in $S$. Any palindromic
factorization of $\delta(S)$ consisting of preserved palindromes only corresponds to a
palindromic factorization of $S$. We call this a preserved palindromic factorization of
$\delta(S)$. Notice that a preserved palindromic factorization $\delta(\mathcal{P})$ is diverse if and only
if $\mathcal{P}$ is diverse.

Now consider an arbitrary non-preserved palindromic factorization of $\delta(S)$. It
is easy to see that the first palindrome must be either a single $b$ or a preserved
palindrome. Furthermore, any palindrome following a preserved palindrome in the
factorization must be either a single $b$ or a preserved palindrome. Thus the palindromic
factorization of $\delta(S)$ begins with a (possibly empty) sequence of preserved
palindromes followed by a single $b$. A symmetric argument shows that the factorization
also ends with a (possibly empty) sequence of preserved palindromes preceded
by a single $b$. The two single $b$'s cannot be the same $b$ since one is the first $b$ in an
image of a symbol in $S$, and the other is a last $b$. Thus a non-preserved palindromic
factorization can never be diverse.

The above discussion proves the following lemma. 

Lemma 8. For any string $S$, $\delta(S)$ has a diverse palindromic factorization if and
only if $S$ has a diverse palindromic factorization.
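
Lemma 8 can be spot-checked on small inputs by exhaustive search. The sketch below is an illustration only, with hypothetical function names; it fixes one particular bijection (sorted symbols, exponents starting at 1) and uses an exponential-time search, so it is meant for very short strings.

```python
def has_diverse_factorization(s):
    """Exhaustively decide whether s can be split into distinct palindromes."""
    used = set()

    def extend(start):
        if start == len(s):
            return True
        for end in range(start + 1, len(s) + 1):
            f = s[start:end]
            if f == f[::-1] and f not in used:
                used.add(f)
                if extend(end):
                    return True
                used.remove(f)
        return False

    return extend(0)


def delta_image(s):
    """Binary image of s: the i-th distinct symbol (in sorted order, counting
    from 1) is replaced by b a^i b."""
    code = {c: "b" + "a" * (i + 1) + "b" for i, c in enumerate(sorted(set(s)))}
    return "".join(code[c] for c in s)


def agrees_with_lemma_8(s):
    """True if s and its binary image are both factorizable or both not."""
    return has_diverse_factorization(s) == has_diverse_factorization(delta_image(s))
```

For example, xyzx has no diverse palindromic factorization (its only palindromes are single symbols and x occurs twice), and neither does its image babbaabbaaabbab, whereas xyx and its image are both palindromes and hence trivially factorizable.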

Applying the lemma to the string $S$ constructed from a Boolean circuit $C$ as
described in Sections 3, 4 and 5 shows that $\delta(S)$ has a diverse palindromic
factorization if and only if $C$ is satisfiable. Since $\delta(S)$ can be constructed in time quadratic
in the size of $C$, we have a binary alphabet version of Theorem 6:

Theorem 9. Diverse palindromic factorization of binary strings is NP-complete. 

If we allow each factor to occur at most $k > 1$ times, the above transformation
to a binary alphabet does not work anymore, because two single $b$'s are now allowed.
However, a small modification is sufficient to correct this. First, we replace $\delta$ with
a bijection $\delta' : \Sigma \to \{b a^i b : i \in [3..|\Sigma| + 2]\}$. Second, we append to $\delta'(S)$ the string
$Q_k$, which is the length-$20k$ prefix of $(abbaab)^*$.

Let us first analyze the palindromic structure of $Q_k$. It is easy to see that the
only palindromes in $Q_k$ are

a, b, aa, bb, aba, bab, abba, and baab.

The total length of these palindromes is 20 and thus the only possible $k$-diverse
palindromic factorization of $Q_k$ is one where all the above palindromes appear
exactly $k$ times. Such factorizations exist too. For example, $k$ copies of

(abba, aba, bb, aa, bab, baab)

followed by $2k$ single symbol palindromes is such a factorization.
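
These claims about $Q_k$ are easy to confirm mechanically for small $k$. The sketch below assumes that the prefix of $(abbaab)^*$ means the prefix of the infinite repetition of abbaab; the function names are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice


def q(k):
    """Length-20k prefix of the infinite repetition of 'abbaab'."""
    return "".join(islice(cycle("abbaab"), 20 * k))


def distinct_palindromes(s):
    """All distinct non-empty palindromic substrings of s (brute force)."""
    return {s[i:j]
            for i in range(len(s))
            for j in range(i + 1, len(s) + 1)
            if s[i:j] == s[i:j][::-1]}


def example_factorization(k):
    """k copies of the six-factor block, then the last 2k symbols one by one."""
    block = ["abba", "aba", "bb", "aa", "bab", "baab"]  # spells (abbaab)^3
    tail = q(k)[18 * k:]
    return block * k + list(tail)


def multiplicities(factors):
    """Number of occurrences of each distinct factor."""
    return dict(Counter(factors))
```

The block of six factors spells out three periods of abbaab, so $k$ copies cover the first $18k$ symbols of $Q_k$; the remaining $2k$ symbols contain $k$ a's and $k$ b's, which is why every one of the eight palindromes ends up being used exactly $k$ times.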

Now consider the string $\delta'(S) Q_k$. It is easy to verify that the only palindromes
overlapping both $\delta'(S)$ and $Q_k$ are aba and bab. However, in any palindromic
factorization containing one of them, the factorization of the remaining part of $Q_k$ together
with the overlapping palindrome would have to contain more than $k$ occurrences
of some factor. Thus in any $k$-diverse palindromic factorization of $\delta'(S) Q_k$, there
are no overlapping palindromes and the factorizations of $\delta'(S)$ and $Q_k$ are separate.
Since the factorization of $Q_k$ contains $k$ single $b$'s, the factorization of $\delta'(S)$ cannot
contain any single $b$'s. Then, by the discussion earlier in this section, all palindromes
in the factorization of $\delta'(S)$ must be preserved palindromes.
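
The statement about palindromes overlapping both parts can likewise be checked on examples. The sketch below is illustrative only: `straddling_palindromes` is a hypothetical helper, and the left-hand strings used in the usage note are small hand-built images under $\delta'$ (codes $b a^i b$ with $i \ge 3$).

```python
from itertools import cycle, islice


def q(k):
    """Length-20k prefix of the infinite repetition of 'abbaab'."""
    return "".join(islice(cycle("abbaab"), 20 * k))


def straddling_palindromes(left, right):
    """Distinct palindromic substrings of left + right that contain at least
    one symbol of left and at least one symbol of right."""
    s, cut = left + right, len(left)
    return {s[i:j]
            for i in range(cut)                    # starts inside left
            for j in range(cut + 1, len(s) + 1)    # ends inside right
            if s[i:j] == s[i:j][::-1]}
```

For instance, with `left = "baaab" + "baaaab"` (the image of a two-symbol string) and `right = q(1)`, the only straddling palindromes are aba and bab, in line with the claim above.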

Lemma 10. For any string $S$ and any $k > 1$, the string $\delta'(S) Q_k$ has a $k$-diverse
palindromic factorization if and only if $S$ has a $k$-diverse palindromic factorization.

Combining this with Theorem 7 we obtain the following:

Theorem 11. For any fixed $k > 1$, $k$-diverse palindromic factorization of binary
strings is NP-complete.